Improved Portability And Parsing Through Interactive Acquisition Of Semantic Information

Authors

  • François-Michel Lang
  • Lynette Hirschman

Abstract

This paper presents SPQR (Selectional Pattern Queries and Responses), a module of the PUNDIT text-processing system designed to facilitate the acquisition of domain-specific semantic information and to improve the accuracy and efficiency of the parser. SPQR operates by interactively and incrementally collecting information about the semantic acceptability of certain lexical co-occurrence patterns (e.g., subject-verb-object) found in partially constructed parses. The module has proved to be a valuable tool for porting PUNDIT to new domains and for acquiring essential semantic information about those domains. Preliminary results also indicate that SPQR yields a threefold reduction in the number of parses found and about a 40% reduction in total parsing time.

A major concern in designing a natural-language system is portability: it is advantageous to design a system so that it can be ported to new domains with a minimum of effort. The effort required for such a port is considerably reduced if the system features a high degree of modularity. For example, if the domain-independent and domain-specific components of a system are clearly factored, only the domain-specific knowledge bases need be changed when porting to a new domain. Even if a system exhibits such separation, however, the problem remains of acquiring this domain-specific knowledge.

One obvious benefit of acquiring domain-specific semantic information is the ability to reject parses, generated by the syntactic component, that are semantically anomalous. Using domain knowledge to rule out semantically anomalous parses is especially important when parsing with large, broad-coverage grammars such as ours: our Prolog implementation of Restriction Grammar [Hirschman1982, Hirschman1985] includes about 100 grammar rules and 75 restrictions and is based on Sager's Linguistic String Grammar [Sager1981]. It also includes a full treatment of sentential fragments and telegraphic message style.
As a result of this extended coverage, many sentences receive numerous syntactic analyses. A majority of these analyses, however, are incorrect because they violate some semantic constraint. Let us take as an example the sentence

High lube oil temperature believed contributor to unit failure.

Two of the parses for this sentence could be paraphrased as:

(1) The high lube oil temperature believed the contributor to the unit failure.
(2) The high lube oil temperature was believed to be a contributor to the unit failure.

Our knowledge of the domain (and common sense), however, tells us that the first parse is wrong, since temperatures cannot hold beliefs. It is only because of this semantic information that we know …
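To make the mechanism concrete, here is a minimal Python sketch of how SPQR-style acceptability judgments might prune the two parses above. It assumes a hypothetical `PatternStore` that caches yes/no answers for subject-verb-object patterns and asks the user (here a stand-in `expert` function) only about patterns it has not seen before. The pattern encoding, the reduction of each parse to a single pattern, and all names are illustrative assumptions, not PUNDIT's actual data structures or interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding of a co-occurrence pattern: (subject head, verb, object head).
Pattern = Tuple[str, str, str]


@dataclass
class PatternStore:
    """Illustrative cache of acceptability judgments (not PUNDIT's actual code)."""

    ask: Callable[[Pattern], bool]  # stands in for querying the user interactively
    judgments: Dict[Pattern, bool] = field(default_factory=dict)
    queries: int = 0  # number of times the user has been asked

    def acceptable(self, pattern: Pattern) -> bool:
        # Ask only about unseen patterns and remember every answer, so the
        # store grows incrementally and repeated patterns cost no new query.
        if pattern not in self.judgments:
            self.queries += 1
            self.judgments[pattern] = self.ask(pattern)
        return self.judgments[pattern]


def parse_survives(store: PatternStore, patterns: List[Pattern]) -> bool:
    # all() stops at the first unacceptable pattern, so a parse can be
    # abandoned without judging the rest of its patterns.
    return all(store.acceptable(p) for p in patterns)


def expert(pattern: Pattern) -> bool:
    # Stand-in for the user's answer: temperatures cannot hold beliefs.
    subject, verb, _ = pattern
    return not (subject == "temperature" and verb == "believe")


# The two readings of the example sentence, reduced to the SVO pattern each
# asserts (the unspecified believer in reading (2) is omitted for brevity).
PARSES: Dict[str, List[Pattern]] = {
    "(1)": [("temperature", "believe", "contributor")],
    "(2)": [("temperature", "be", "contributor")],
}

if __name__ == "__main__":
    store = PatternStore(ask=expert)
    survivors = [name for name, pats in PARSES.items() if parse_survives(store, pats)]
    print(survivors)      # ['(2)']
    print(store.queries)  # 2
    # Parsing the same sentence again is answered entirely from the store.
    for pats in PARSES.values():
        parse_survives(store, pats)
    print(store.queries)  # 2
```

Because answers are cached, the number of questions put to the user should fall as more text from the domain is parsed, and checking each pattern as soon as a partial parse produces it lets an anomalous reading be abandoned early rather than completed and discarded.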



Publication date: 1988